Inferring Phonemic Classes from CNN Activation Maps Using Clustering Techniques

نویسندگان

Thomas Pellegrini

Sandrine Mouysset

چکیده

Today’s state-of-art in speech recognition involves deep neural networks (DNN). These last years, a certain research effort has been invested in characterizing the feature representations learned by DNNs. In this paper, we focus on convolutional neural networks (CNN) trained for phoneme recognition in French. We report clustering experiments performed on activation maps extracted from the different layers of a CNN comprised of two convolution and sub-sampling layers followed by three dense layers. Our goal was to get insights into phone separability and phonemic categories inferred by the network, and how they vary according to the successive layers. Two directions were explored with both linear and non-linear clustering techniques. First, we imposed a number of 33 classes equal to the number of context-independent phone models for French, in order to assess the phoneme separability power of the different layers. As expected, we observed that this power increases with the layer depth in the network: from 34% to 74% in F-measure from the first convolution to the last dense layers, when using spectral clustering. Second, optimal numbers of classes were automatically inferred through interand intra-cluster measure criteria. We analyze these classes in terms of standard French phonological features.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

متن کامل

Organ Segmentation in Poultry Viscera Using RGB-D

We present a pattern recognition framework for semantic segmentation of visual structures, that is, multi-class labelling at pixel level, and apply it to the task of segmenting organs in the eviscerated viscera from slaughtered poultry in RGB-D images. This is a step towards replacing the current strenuous manual inspection at poultry processing plants. Features are extracted from feature maps ...

متن کامل

Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps

Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...

متن کامل

Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

We consider the task of visual net surgery, in which a CNN can be reconfigured without extra data to recognize novel concepts that may be omitted from the training set. While most prior work make use of linguistic cues for such ”zero-shot” learning, we do so by using a pictorial language representation of the training set, implicitly learned by a CNN, to generalize to new classes. To this end, ...

متن کامل

Distinct Class Saliency Maps for Multiple Object Images

This paper proposes a method to obtain more distinct class saliency maps than Simonyan et al. (2014). We made three improvements over their method: (1) using CNN derivatives with respect to feature maps of the intermediate convolutional layers with up-sampling instead of an input image; (2) subtracting saliency maps of the other classes from saliency maps of the target class to differentiate ta...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Inferring Phonemic Classes from CNN Activation Maps Using Clustering Techniques

نویسندگان

چکیده

منابع مشابه

Learning Document Image Features With SqueezeNet Convolutional Neural Network

Organ Segmentation in Poultry Viscera Using RGB-D

Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps

Tinkering Under the Hood: Interactive Zero-Shot Learning with Net Surgery

Distinct Class Saliency Maps for Multiple Object Images

عنوان ژورنال:

اشتراک گذاری